Abstract. Standard models of galaxy formation predict that the matter distribution is
statistically homogeneous and isotropic and characterized by (i) spatial homogeneity
for r > 10 Mpc/h, (ii) small-amplitude structures of relatively limited size (i.e.,
r < 100 Mpc/h) and (iii) anti-correlations for $r > r_c \approx 150$ Mpc/h (i.e., no
structures of size larger than $r_c$). Whether or not the observed galaxy distribution is
interpreted as compatible with these predictions depends on the a-priori assumptions
encoded in the statistical methods employed to characterize the data and on the
a-posteriori hypotheses made to interpret the results. We present strategies to test
the most common assumptions and we find evidence that, in the available samples, the
galaxy distribution is spatially inhomogeneous for r < 100 Mpc/h but statistically
homogeneous and isotropic. We conclude that the observed inhomogeneities pose a
fundamental challenge to the standard picture of cosmology but they also represent an
important opportunity which may open new directions for many cosmological puzzles.



PACS numbers: 98.65.-r,98.65.Dx,98.80.-k 
1. Introduction

One of the cornerstones of modern cosmology is the observation of the
three-dimensional distribution of galaxies [1, 2]. In recent years the extraordinary increase
in the number of measured redshifts has allowed us to characterize in detail galaxy structures
at low redshifts (i.e., z < 0.3) and small scales (i.e., r < 150 Mpc/h). Many authors
(see e.g., [3, 4, 5, 6, 7, 8, 9]) have concluded that the results of a statistical analysis
of the data are compatible with the theoretical expectations of standard scenarios of
galaxy formation, such as the Cold Dark Matter (CDM) model and its variants (e.g., the case in which
the cosmological constant is non-zero, or LCDM). However, there are some important
methodological issues which have not received the due attention [101 HH [121 EH f!5| [16] . 
In particular, the critical points concern the a-priori assumptions which are usually used, 
without being directly tested, in the statistical analysis of the data and the a-posteriori 
hypotheses that are invoked to interpret the results. 

Among the former are the assumptions of spatial homogeneity and of
translational and rotational invariance (i.e., statistical homogeneity), which are built
into the definition of the standard estimators of galaxy correlations [17]. While these
estimators are certainly the correct ones to use when statistical and spatial homogeneity
are verified, it is by no means evident that galaxy data satisfy these properties
in the available samples. It is indeed well known that galaxies are organized into
a network of structures, such as clusters, filaments and voids, with large fluctuations
[18, 19, 20, 21, 22, 23], and it is not a-priori obvious that spatial or statistical homogeneity
is satisfied in a sample of arbitrarily small size.

The observed galaxy distribution is found to be inhomogeneous at small scales
while, according to theoretical models, it is expected to become spatially homogeneous
for $r > \lambda_0 \sim 10$ Mpc/h (see, e.g., [21]): this scale can be easily calculated by considering
how the scale at which fluctuations are of the order of the mean evolves according to linear
perturbation theory of a self-gravitating fluid [25]. The scale $\lambda_0$, a key theoretical
prediction which must be confronted with the data, is usually determined only indirectly
by using statistical methods which assume a-priori spatial homogeneity. When the given
finite sample distribution is not spatially homogeneous the results of such an analysis are very
misleading [17]. Therefore, in order to test directly whether a distribution is spatially
homogeneous it is necessary to introduce more general statistical methods than the usual
ones [17, 11]. These methods consider explicitly the problem of the stability of finite
sample determinations: if a statistical quantity depends on the sample size then it is 
affected by large fluctuations and/or by observational systematic effects; in both cases it 
does not represent a meaningful and useful estimator of an ensemble average property. 
A critical analysis of finite-sample volume averages is thus necessary to identify the 
subtle effects induced by spatial inhomogeneities and to distinguish them from other 
intervening systematic effects. 

As mentioned above, a second kind of ad-hoc hypothesis is often used in the
interpretation of the results of the statistical analysis. These hypotheses are invoked when
one finds results which are a-priori unexpected and which clearly show that some
of the basic assumptions encoded in the statistical methods used are not verified in
the data. Examples are galaxy evolution, luminosity bias, or selection effects due to
some observational issues. It is plausible that some of these may affect the results 
of a statistical analysis; however, in the absence of a quantitative prediction or of an 
independent estimation of these effects, one must use several assumptions (e.g., specific 
functional behavior or arbitrary values for a set of parameters, etc.) [261 123, H] without 
any clean test of their validity. A different strategy, which we adopt here when
possible, is to develop focused tests to establish whether the quantitative influence of
the intervening systematic effects is supported by the given data.

Theoretical models predict the matter density field properties both in the early
and in the late universe. Fluctuations and correlations must have very specific
properties. Firstly, the Friedmann-Robertson-Walker (FRW) geometry is derived under
the assumption that the matter distribution is exactly translationally and rotationally invariant,
i.e. that the matter density is constant on a spatial hyper-surface. On
top of the mean field one can consider statistically homogeneous and isotropic
small-amplitude fluctuations [28]. These furnish the seeds of gravitational clustering which
eventually gives rise to the structures we observe in the present universe.

Secondly, the statistical properties of matter density fluctuations have to satisfy
an important condition in order to be compatible with the FRW geometry [29, 30]. In
its essence, the condition is that fluctuations in the gravitational potential induced by
density fluctuations do not diverge at large scales [31, 32, 33]. This situation requires
that the matter density field fluctuations decay in the fastest possible way with
scale [32]. Correspondingly the two-point correlation function becomes negative at large
scales (i.e., r > 150 Mpc/h), which implies the absence of larger structures of tiny density
fluctuations. Are the observed large scale structures and fluctuations compatible with
such a scenario?

This paper is organized as follows. In Sect. 2 we briefly review the main properties
of both spatially homogeneous and inhomogeneous stochastic density fields. The main
features of real space correlation properties of standard cosmological density fields are
presented in Sect. 3. In the case of a finite-sample distribution (Sect. 4) the information
that can be extracted from the data is obtained through a statistical analysis, and hence through the
computation of volume averages. We discuss how to set up a strategy to analyze a point
distribution in a finite volume, stressing the sequence of steps that should be considered
in order to reduce as much as possible the role of a-priori assumptions encoded in the
statistical analysis and to correctly interpret the meaning of the measured volume
averages. The analysis of the galaxy data is presented in Sect. 5. We show that the galaxy
distribution, at relatively low redshifts (i.e., z < 0.3) and small scales (i.e., r < 150
Mpc/h), is characterized by large density fluctuations which correspond to large-scale
correlations. We emphasize that by using the standard statistical tools one reaches a
different conclusion. This occurs because these methods are based on several important
assumptions: some of them, when directly tested, are not verified, while others are very
strong ad-hoc hypotheses which require a detailed investigation. Finally, in Sect. 6 we
draw our main conclusions.

2. A brief review of the main statistical properties 

Before addressing the problems related to the statistical characterization of finite
samples, we review the main probabilistic properties of mass density fields. This means
that we consider ensemble averages or, for ergodic cases, volume averages in the infinite
volume limit.

A mass density field can be represented as a stationary stochastic process that
consists in extracting the value of the microscopic density function $\rho(\vec{r})$‡ at any point of
space. This process is completely characterized by its probability density functional $\mathcal{P}[\rho(\vec{r})]$.
This functional can be interpreted as the joint probability density function (PDF) of the
random variables $\rho(\vec{r})$ at every point $\vec{r}$. If the functional $\mathcal{P}[\rho(\vec{r})]$ is invariant under spatial
translations then the stochastic process is statistically homogeneous or translationally
invariant (stationary) [17]. When $\mathcal{P}[\rho(\vec{r})]$ is also invariant under spatial rotations then
the density field is statistically isotropic [17].

A crucial assumption usually made when comparing theoretical predictions to data
is that stochastic fields satisfy spatial ergodicity. Let us take a generic
observable $\mathcal{F} = \mathcal{F}(\rho(\vec{r}_1), \rho(\vec{r}_2), ...)$, a function of the mass distribution $\rho(\vec{r})$ at different
points in space $\vec{r}_1, \vec{r}_2, ...$. Ergodicity implies that $\langle \mathcal{F} \rangle = \overline{\mathcal{F}} = \lim_{V \to \infty} \overline{\mathcal{F}}_V$, where the
symbol $\langle ... \rangle$ denotes the (ensemble) average over different realizations of the stochastic
process, and $\overline{\mathcal{F}}_V = \frac{1}{V} \int_V \mathcal{F} \, dV$ is the spatial average in a finite volume V [11].

2.1. Spatially homogeneous distributions 

The condition of spatial homogeneity (uniformity) is satisfied if the ensemble average
density of the field $\rho_0 = \langle \rho \rangle$ is strictly positive, i.e. for an ergodic stochastic field [17],

$$ \lim_{R \to \infty} \frac{1}{V(R; \vec{x}_0)} \int_{V(R; \vec{x}_0)} d^3x \, \rho(\vec{x}) = \rho_0 > 0 \quad \forall \vec{x}_0 , \tag{1} $$

where R is the linear size of a volume V with center in $\vec{x}_0$. Note that it is necessary to
carefully test spatial homogeneity before applying the definitions given in this section
to a finite sample distribution (see Sect. 4). Indeed, for inhomogeneous distributions
the estimation of the average density substantially differs from its asymptotic value and
thus the sample estimation of $\rho_0$ is biased by finite size effects. Unbiased tests of spatial
homogeneity can be achieved by measuring conditional properties (see below).

‡ We use the symbol $\rho(\vec{r})$ for the microscopic mass density and $n(\vec{r})$ for the microscopic number density.
However in the following sections we consider only the number density, as is usually done in studies
of galaxy distributions. In that case we can simply replace the symbol $\rho(\vec{r})$ with $n(\vec{r})$ and all the
definitions given in this section remain unchanged.

A distribution is spatially inhomogeneous up to a scale $\lambda_0$, i.e. spatially
homogeneous for $r > \lambda_0$, if [11]

$$ \left| \frac{1}{V(R; \vec{x}_0)} \int_{V(R; \vec{x}_0)} d^3x \, \rho(\vec{x}) - \rho_0 \right| < \rho_0 \quad \forall R > \lambda_0 , \; \forall \vec{x}_0 . \tag{2} $$

This equation defines the homogeneity scale $\lambda_0$ which separates the strongly fluctuating
regime $r < \lambda_0$ from the regime where fluctuations have small amplitude relative to the
asymptotic average.
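
As a rough numerical illustration of the homogeneity criterion above, the following Python sketch measures, for a point set in a unit cube, the largest relative deviation of the density in spheres of radius R from the mean density, and returns the smallest tested radius beyond which that deviation stays below one for all tested centers. The function names, the unit-cube geometry, the random choice of centers and the use of the global sample mean as a stand-in for the asymptotic density are assumptions made for this example only; in a real sample the mean itself is meaningful only once homogeneity has been established.

```python
import math
import random

def sphere_density(points, center, radius):
    """Number density of points inside a sphere of given center and radius."""
    r2 = radius * radius
    count = sum(1 for p in points
                if (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2
                + (p[2] - center[2]) ** 2 <= r2)
    return count / (4.0 / 3.0 * math.pi * radius ** 3)

def max_relative_deviation(points, radius, n_centers, rng):
    """Largest |n_R(x0)/n0 - 1| over random centers whose sphere fits in the unit cube."""
    n0 = float(len(points))  # mean density of the sample (cube of volume 1)
    worst = 0.0
    for _ in range(n_centers):
        center = [rng.uniform(radius, 1.0 - radius) for _ in range(3)]
        worst = max(worst, abs(sphere_density(points, center, radius) / n0 - 1.0))
    return worst

def homogeneity_scale(points, radii, n_centers=50, seed=1):
    """Smallest tested radius such that the deviation is < 1 there and at all larger radii."""
    rng = random.Random(seed)
    ordered = sorted(radii)
    devs = [max_relative_deviation(points, r, n_centers, rng) for r in ordered]
    scale = None
    for r, d in zip(reversed(ordered), reversed(devs)):
        if d >= 1.0:
            break
        scale = r
    return scale, devs

# Toy data: a homogeneous Poisson distribution of 2000 points (illustrative choice).
rng = random.Random(0)
poisson = [(rng.random(), rng.random(), rng.random()) for _ in range(2000)]
scale, devs = homogeneity_scale(poisson, [0.02, 0.05, 0.1, 0.2, 0.3])
print("estimated homogeneity scale:", scale)
print("max relative deviations:", devs)
```

For a Poisson sample the estimated scale is of the order of a few mean inter-particle distances; for a strongly clustered sample it can be comparable to the sample size, in which case no homogeneity scale is detected within the sample.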

Let us now discuss the characterization of two-point correlation properties.
The quantity $\langle \rho(\vec{r}_1) \rho(\vec{r}_2) \rangle \, dV_1 dV_2$ gives the a-priori probability to find two particles
simultaneously placed in the infinitesimal volumes $dV_1, dV_2$ respectively around $\vec{r}_1, \vec{r}_2$.
The quantity

$$ \langle \rho(r_{12}) \rangle_p \, dV_1 dV_2 = \frac{\langle \rho(\vec{r}_1) \rho(\vec{r}_2) \rangle}{\rho_0} \, dV_1 dV_2 \tag{3} $$

gives the a-priori probability of finding two particles placed in the infinitesimal volumes
$dV_1, dV_2$ around $\vec{r}_1$ and $\vec{r}_2$ with the condition that the origin of the coordinates is occupied
by a particle (Eq. 3 is the ratio of unconditional quantities, and thus, by the rules of
conditional probabilities, it defines a conditional quantity) [11].

For a stationary and spatially homogeneous distribution (i.e., $\rho_0 > 0$), we may
define the reduced two-point correlation function as [11]

$$ \xi(r_{12}) = \frac{\langle \rho(r_{12}) \rangle_p}{\rho_0} - 1 = \frac{\langle \rho(\vec{r}_1) \rho(\vec{r}_2) \rangle}{\rho_0^2} - 1 . \tag{4} $$

This function characterizes two-point correlation properties of small amplitude density
fluctuations. When spatial homogeneity has already been established, several
useful pieces of information can be extracted from $\xi(r)$, in particular one or a few
characteristic length scales. For instance, the correlation length typically corresponds
to an exponential decay of $\xi(r)$ of the type $\xi(r) \sim \exp(-r/r_c)$ [17].

The two-point correlation function defined by Eq. 4 is simply related to the
normalized mass variance in a volume V(R) of linear size R [17]

$$ \sigma^2(R) = \frac{\langle M(R)^2 \rangle - \langle M(R) \rangle^2}{\langle M(R) \rangle^2} = \frac{1}{V^2(R)} \int_{V(R)} d^3r_1 \int_{V(R)} d^3r_2 \, \xi(r_{12}) . \tag{5} $$

The scale $r_*$ at which fluctuations are of the order of the mean, i.e. $\sigma(r_*) = 1$, is
proportional to the scale $r_0$ at which $\xi(r_0) = 1$ and to the scale $\lambda_0$ defined in Eq. 2.

For spatially uniform systems, when the volume V in Eq. 5 is a real space sphere§,
it is possible to proceed to the following classification for the scaling behavior of the
normalized mass variance at large enough scales [31, 11]:

$$ \sigma^2(R) \sim \begin{cases} R^{-(3+n)} & \text{for } -3 < n < 1 \\ R^{-(3+1)} \log R & \text{for } n = 1 \\ R^{-(3+1)} & \text{for } n > 1 \end{cases} \tag{6} $$

§ The case in which the volume is a Gaussian sphere can be misleading, see discussion in, e.g., [31].
For $-3 < n < 0$ (which corresponds to $\xi(r) \sim r^{-\gamma}$ with $0 < \gamma = 3 + n < 3$), mass
fluctuations are super-Poisson. These are, for instance, typical of systems at the critical
point of a second order phase transition [17]: there are long-range correlations and the
correlation length $r_c$ is infinite. For $n = 0$ fluctuations are Poisson-like and the system is
called substantially Poisson: there are no correlations (i.e., a purely Poisson distribution)
or correlations limited to small scales, i.e. of the type $\xi(r) \sim \exp(-r/r_c)$, with a finite
$r_c$. This behavior is typical of many common physical systems, e.g., a homogeneous
gas at thermodynamic equilibrium at sufficiently high temperature. Finally, for $n \geq 1$
fluctuations are sub-Poisson or super-homogeneous [31, 17] (or hyper-uniform [32]). In
this case $\sigma^2(R)$ presents the fastest possible decay for discrete or continuous distributions
[31] and the two-point correlation function has to satisfy the following global constraint

$$ \int_0^\infty d^3r \, \xi(r) = 0 \tag{7} $$

(see Sect. 3 for more details). Examples are provided by the one component plasma,
a well-known system in statistical physics [31], and by a randomly shuffled lattice of
particles [11, 35].
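
To make the contrast between Poisson and super-homogeneous fluctuations concrete, the sketch below compares the normalized variance of the number of points in spheres for a Poisson distribution and for a randomly shuffled lattice with the same mean density. All names and parameter values (lattice size, displacement amplitude, number of spheres, radii) are illustrative choices rather than values taken from the references, and periodic boundary conditions in a unit cube are assumed.

```python
import random

def poisson_points(n, rng):
    """n points placed uniformly at random in the unit cube."""
    return [(rng.random(), rng.random(), rng.random()) for _ in range(n)]

def shuffled_lattice(m, amplitude, rng):
    """m^3 cubic lattice; each site is displaced uniformly by up to `amplitude` cells."""
    a = 1.0 / m
    pts = []
    for i in range(m):
        for j in range(m):
            for k in range(m):
                pts.append(tuple(
                    ((c + 0.5 + rng.uniform(-amplitude, amplitude)) * a) % 1.0
                    for c in (i, j, k)))
    return pts

def count_in_sphere(points, center, radius):
    """Number of points within `radius` of `center`, with periodic wrapping."""
    r2 = radius * radius
    n = 0
    for p in points:
        d2 = 0.0
        for x, c in zip(p, center):
            d = abs(x - c)
            d = min(d, 1.0 - d)  # nearest periodic image
            d2 += d * d
        if d2 <= r2:
            n += 1
    return n

def normalized_variance(points, radius, n_spheres, rng):
    """Sample estimate of (<N^2> - <N>^2) / <N>^2 over randomly centered spheres."""
    counts = [count_in_sphere(points, (rng.random(), rng.random(), rng.random()), radius)
              for _ in range(n_spheres)]
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean ** 2

rng = random.Random(42)
pois = poisson_points(1000, rng)
latt = shuffled_lattice(10, 0.5, rng)
results = {}
for radius in (0.15, 0.3):
    results[radius] = (normalized_variance(pois, radius, 200, rng),
                       normalized_variance(latt, radius, 200, rng))
    print(radius, results[radius])
```

At equal density the shuffled lattice shows a smaller variance than the Poisson case, and the gap widens with the radius, in line with the faster decay expected for super-homogeneous systems. A quantitative fit of the exponents would require larger samples than used here.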

Note that any uniform stochastic process has to satisfy the following condition

$$ \lim_{R \to \infty} \sigma^2(R) = \lim_{R \to \infty} \frac{1}{V^2(R)} \int_{V(R)} d^3r_1 \int_{V(R)} d^3r_2 \, \xi(r_{12}) = 0 , \tag{8} $$

which implies that the average density $\rho_0$, in the infinite volume limit, is a well defined
concept, i.e. $\rho_0 > 0$ [17]. This is a weaker condition than that required by Eq. 7.

2.2. Spatially inhomogeneous distributions 

A distribution is spatially inhomogeneous in the ensemble (or in the infinite volume
limit) sense if $\lambda_0 \to \infty$. For statistically homogeneous distributions, from Eq. 2 we find
that the ensemble average density is $\rho_0 = 0$. Thus unconditional properties are not well
defined: if we consider a randomly placed finite volume in an infinite inhomogeneous
distribution, it typically contains no points. Therefore only conditional properties are
well defined, as for instance the average conditional density defined in Eq. 3.

For a statistically homogeneous and isotropic fractal structure (where all points
are alike) the average conditional mass included in a spherical volume grows as
$\langle M(r) \rangle_p \sim r^D$: for D < 3, the average conditional density presents a scaling behavior
of the type [17]

$$ \langle \rho(r) \rangle_p = \frac{\langle M(r) \rangle_p}{V(r)} \sim r^{D-3} , \tag{9} $$

so that $\lim_{r \to \infty} \langle \rho(r) \rangle_p = 0$. The hypotheses underlying the derivation of the Central
Limit Theorem are violated by the long-range character of spatial correlations, resulting
in a PDF of fluctuations that does not follow the Gaussian function [11, 36]. On the
contrary, the PDF typically displays "long tails" and some moments of the distribution
may diverge [37].
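
The fractal scaling described above can be checked with a short worked example. The sketch assumes an idealized conditional mass-length relation of the form M(r) = B r^D, with the hypothetical values D = 2 and B = 1, and recovers the exponent of the conditional density from a least-squares fit in log-log space. The function name and all numbers are illustrative.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

D = 2.0  # assumed fractal dimension (hypothetical value)
B = 1.0  # assumed amplitude (hypothetical value)
radii = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
mass = [B * r ** D for r in radii]                     # conditional mass in a sphere
density = [m / (4.0 / 3.0 * math.pi * r ** 3)          # conditional density
           for m, r in zip(mass, radii)]
slope = loglog_slope(radii, density)
print("fitted exponent:", slope, "expected D - 3 =", D - 3.0)
```

With D = 2 the conditional density falls by a factor of 100 between r = 1 and r = 100, so any estimate of a "mean density" depends on the size of the volume in which it is measured.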
It is possible to introduce more complex inhomogeneous distributions than Eq. 9, for
instance the multi-fractal distributions for which the scaling properties are not described
by a single exponent, but they change in different spatial locations p2]. Another simple 
(and different!) example is given by a distribution in which the scaling exponent in
Eq. 9 depends on distance, i.e. $D = D(r) < 3$.

3. Statistical properties of the standard model 

As discussed in the introduction, the important constraint that must be valid for
any kind of matter density fluctuation field in the framework of FRW models is
represented by the condition of super-homogeneity, corresponding in cosmology to the
so-called property of "scale-invariance" of the primordial fluctuations power spectrum
(PS)‖. To avoid confusion, note that in statistical physics the term "scale-invariance"
is used to describe the class of distributions which are invariant with
respect to scale transformations. For instance, a magnetic system at the critical point
of transition between the paramagnetic and ferromagnetic phase shows a two-point
correlation function which decays as a non-integrable power law, i.e. $\xi(r) \sim r^{-\gamma}$ with
$0 < \gamma < 3$ (super-Poisson distribution in Eq. 6). The meaning of "scale-invariance" in
the cosmological context is therefore completely different, referring to the property that
the mass variance at the horizon scale be constant (see below) [31].

‖ The PS of density fluctuations is $P(k) = \langle |\delta_\rho(\vec{k})|^2 \rangle$, where $\delta_\rho(\vec{k})$ is the Fourier Transform of the
normalized fluctuation field $(\rho(\vec{r}) - \rho_0)/\rho_0$ [31].

3.1. Basic Properties

Matter distribution in cosmology is assumed to be a realization of a stationary stochastic
point process that is also spatially uniform. In the early universe the homogeneity scale
$\lambda_0$ is of the order of the inter-particle distance, and thus negligible, while it grows during
the process of structure formation driven by gravitational clustering. The main property
of primordial density fields in the early universe is that they are super-homogeneous,
satisfying Eq. 6 with n = 1. This latter property was first hypothesized in the seventies
[29, 30] and it subsequently gained in importance with the advent of inflationary models
in the eighties [31].

In order to discuss this property, let us recall that the fluctuations in the early
universe are taken to have Gaussian statistics and a certain PS. Since fluctuations
are Gaussian, the knowledge of the PS gives a complete statistical description of the
fluctuation field. In a FRW cosmology there is a fundamental characteristic length
scale, the horizon scale $R_H(t)$, that is simply the distance light can travel from the Big
Bang singularity t = 0 until any given time t in the evolution of the Universe. This
scale grows linearly with time. Harrison [29] and Zeldovich [30] introduced the criterion
that matter fluctuations have to satisfy on large enough scales. This is named the
Harrison-Zeldovich criterion (H-Z); it can be written as [11]

$$ \sigma^2(R = R_H(t)) = \mathrm{constant} . \tag{10} $$

This condition states that the mass variance at the horizon scale is constant: it can be
expressed more conveniently in terms of the PS, for which Eq. 10 is equivalent to assuming
$P(k) \sim k$ (the H-Z PS) and that on a spatial hyper-surface $\sigma^2(R) \sim R^{-4}$ [31, 11].

3.2. Physical implications of super-homogeneity 

In order to illustrate the physical implications of the H-Z condition, one may
consider the gravitational potential fluctuations $\delta\phi(\vec{r})$, which are linked to the density
fluctuations $\delta\rho(\vec{r})$ via the gravitational Poisson equation: $\nabla^2 \delta\phi(\vec{r}) = 4\pi G \, \delta\rho(\vec{r})$. From
this equation, transformed into Fourier space, it follows that the PS of the gravitational
potential fluctuations, $P_\phi(k)$, is related to the density PS through
the equation $P_\phi(k) \sim P(k)/k^4$. The H-Z condition, $P(k) \sim k$, corresponds
to $P_\phi(k) \propto k^{-3}$, so that the variance of the gravitational potential fluctuations,
$\sigma_\phi^2(R) \sim P_\phi(k) k^3 |_{k = R^{-1}}$, is constant with k [31].

The H-Z condition is a consistency constraint in the framework of FRW
cosmology. Indeed, the FRW metric is a cosmological solution for a perfectly spatially and
statistically homogeneous universe, about which fluctuations represent inhomogeneous
perturbations. If density fluctuations obey a condition different from Eq. 10, and
thus n < 1 in Eq. 6, then the FRW description will always break down in the past or
future, as the amplitude of the perturbations becomes arbitrarily large or small. Thus
the super-homogeneous nature of the primordial density field is a fundamental property,
independent of the nature of dark matter. This is a very strong condition to impose,
and it excludes even Poisson processes (n = 0 in Eq. 6) [31], for which fluctuations in the
gravitational potential diverge at large scales.

3.3. The two-point correlation function and super-homogeneity

The super-homogeneity (or H-Z) condition corresponds to the limit condition expressed
by Eq. 7, which is another way of stating that $\lim_{k \to 0} P(k) = 0$. This
means that there is a fine-tuned balance between small-scale positive correlations and
large-scale anti-correlations [31, 11].

Various models of primordial density fields differ in the behavior of the PS at
short wave-lengths (large k), which is determined by the specific properties hypothesized for the
dark matter component. For example, in the Cold Dark Matter (CDM) scenario, where
elementary non-baryonic dark matter particles have a small velocity dispersion, the PS
decays as a power law $P(k) \sim k^{-2}$ at large k. For Hot Dark Matter (HDM) models,
where the velocity dispersion is large, the PS presents an exponential decay at large
k. However, at small k they both exhibit the H-Z tail $P(k) \sim k$, which is indeed the
common feature of all density fields compatible with FRW models. The scale $r_c \approx k_c^{-1}$
at which the PS shows the turnover from the linear to the decaying behavior is fixed to
be the size of the horizon at the time of equality between matter and radiation [42].

Correspondingly, the correlation function $\xi(r)$ of CDM (HDM) models (see Fig. 1)
presents the following behavior: it is positive at small scales (decaying as $\xi(r) \sim r^{-1}$
for CDM and being almost flat for HDM), it crosses zero at $r_c$ and then it is negative,
approaching zero as $-r^{-4}$ (in the region corresponding to $P(k) \sim k$) [11].

3.4. Baryonic acoustic oscillations

Let us now mention the baryon acoustic oscillations (BAO) scale [38]. The physical
description which gives rise to these oscillations is based on fluid mechanics and gravity:
when the temperature of the plasma was higher than $\sim 10^3$ K, photons were energetic
enough to ionize hydrogen, so that baryons and photons can be described as a single
fluid. Gravity attracts and compresses this fluid into the potential wells associated with
the local density fluctuations. Photon pressure resists this compression and sets up
acoustic oscillations in the fluid. Regions that have reached maximal compression by
recombination become hotter and hence are now visible as local positive anisotropies
in the cosmic microwave background radiation (CMBR), if the different k-modes are
assumed to have the same phase (which is the central hypothesis in this context).

For our discussion, the principal point to note is that while the k-oscillations are
de-localized, the real space correlation function $\xi(r)$ has a localized feature at the scale $r_{bao}$
corresponding to the frequency of oscillations in k space. This simply reflects the fact that the
Fourier Transform of a regularly oscillating function is a localized function. Formally
the scale $r_{bao}$ corresponds to a scale where a derivative of $\xi(r)$ is not continuous [11, 39].
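
The statement that a regular oscillation in k space maps onto a localized feature in real space can be illustrated numerically. The sketch below uses a toy spectrum, a Gaussian-damped sin(k R)/(k R) oscillation, which is an assumption made for this example and not the LCDM power spectrum; the oscillation scale `R_OSC`, the damping `SIGMA` and the integration settings are likewise arbitrary illustrative values. The isotropic three-dimensional Fourier transform is evaluated with a simple trapezoidal rule.

```python
import math

def j0(x):
    """Spherical Bessel function of order zero, sin(x)/x."""
    return 1.0 if abs(x) < 1e-8 else math.sin(x) / x

def xi_from_power(power, r, k_max, n_steps):
    """xi(r) = (1 / 2 pi^2) * integral_0^k_max k^2 P(k) j0(k r) dk (trapezoidal rule)."""
    dk = k_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        k = i * dk
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * k * k * power(k) * j0(k * r)
    return total * dk / (2.0 * math.pi ** 2)

R_OSC = 100.0  # assumed oscillation scale, arbitrary length units
SIGMA = 5.0    # assumed damping length that smooths the feature

def toy_wiggles(k):
    """Toy oscillating spectrum: regular oscillations of period 2 pi / R_OSC in k."""
    return math.exp(-(k * SIGMA) ** 2) * j0(k * R_OSC)

radii = [50.0 + i for i in range(101)]  # r from 50 to 150
xi = [xi_from_power(toy_wiggles, r, k_max=1.2, n_steps=1200) for r in radii]
r_peak = radii[max(range(len(xi)), key=lambda i: xi[i])]
print("feature located near r =", r_peak)
```

The oscillations extend over the whole range of k, yet the corresponding correlation function is negligible everywhere except in a narrow shell around the oscillation scale, whose width is set by the damping length.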

3.5. Size of structures and characteristic scales

In summary, there are three characteristic scales in LCDM-type models (see Fig. 1). The
first is the homogeneity scale, which depends on time, $\lambda_0 = \lambda_0(t)$; the second is the scale
$r_c$ where $\xi(r_c) = 0$ (that is roughly proportional to the scale marking an exponential decay
of $\xi(r)$), which is fixed by the initial properties of the matter density field, which also
determine the third scale $r_{bao}$. When the homogeneity scale is smaller than $r_{bao}$ and $r_c$,
these two scales are substantially unchanged by gravitational dynamics, as this is in the
linear regime. The rate of growth of the homogeneity scale can be simply computed by
using the linear perturbation analysis of a self-gravitating fluid in an expanding universe
[25]. Given the initial amplitude of fluctuations and the assumed initial PS of matter
density fluctuations, under typical assumptions one finds that $\lambda_0(t_{now}) \sim 10$ Mpc/h.

By characterizing the two-point correlation function of the galaxy distribution we can
identify three fundamental tests of standard models¶:

¶ For the power spectrum there are additional complications, related to how galaxies are biased with
respect to the underlying density field: see [40, 33, 41] for further details.
Figure 1. Schematic behavior of the two-point correlation function for the LCDM
case. At small scales $r < r_0 \approx 10$ Mpc/h (where $\xi(r_0) = 1$) non-linear gravitational
clustering has changed the initial shape of $\xi(r)$. At larger scales $\xi(r)$ has been only
amplified by gravitational clustering in the linear regime. For $10 < r < r_c \approx 120$
Mpc/h the correlation is positive and with small amplitude. At larger scales it is
negative and characterized by the $\xi(r) \sim -r^{-4}$ behavior. The location of $r_{bao}$ is fixed
by cosmological parameters: in the example shown $r_{bao} < r_c$, as predicted by the
"concordance model" [3].



• If the homogeneity scale $\lambda_0$ is much larger (i.e., by a factor 5-10) than ~ 10 Mpc/h,
then there is not enough time to form non-linear large scale structures in LCDM
models [11].

• If the zero crossing scale of $\xi(r)$ is much larger than ~ 100 Mpc/h then there
is a problem in the description of the early universe physics.

• A clear test of inflationary models is given by the detection of the negative part
of the correlation function, i.e. the range of scales in which it behaves as $\xi(r) \sim -r^{-4}$: all
models necessarily predict such a behavior+.



4. Testing assumptions in the statistical methods 

A number of different statistics, determined by making a volume average in a finite 
sample, can be used to characterize a given distribution. In addition, each statistical 

+ In the same range of scales the PS is expected to be linear with the wave-number, i.e. P(k) ~ k. 
However selection effects may change the behavior of the PS to constant but not the functional behavior 

of £(»o gaiEnm. 
quantity can be measured by using different estimators. For this reason we have to set
up a strategy to attack the problem when we do not know a-priori the properties
of the given finite sample distribution. In practice, to get the correct information from
the data we have to reduce as much as possible the number of a-priori assumptions used
in the statistical methods.

We limit our discussion to the case of interest, i.e. a set of N point particles (i.e.
galaxies) in a volume V. The microscopic number density can be simply written as
$n(\vec{r}) = \sum_{i=1}^{N} \delta^3(\vec{r} - \vec{r}_i)$, where $\delta^3(\vec{r})$ is the Dirac delta function. The statistical quantities
defined in Sect. 2 can be rewritten in terms of the stochastic variable

$$ N(V; \vec{y}_i) = \int_{V(\vec{y}_i)} d^3x \, n(\vec{x}) , \tag{11} $$

where $\vec{y}_i$ identifies the coordinates of the center of the volume V. If the center $\vec{y}_i$ coincides
with a point particle position $\vec{r}_i$, then Eq. 11 is a conditional quantity. Instead, if the
center $\vec{y}_i$ can be any point of space (occupied or not by a particle) then the statistic
in Eq. 11 is unconditional and it is useful to compute, for instance, the mass variance
defined in Eq. 5.

For inhomogeneous distributions, unconditional properties are ill-defined (Sect. 2)
and thus we first analyze conditional quantities and then, only when spatial
homogeneity has been detected inside the given sample, consider unconditional
ones. Therefore, in what follows we take as volume V in Eq. 11 a sphere* of radius r
centered on a point of the distribution, i.e., we consider the stochastic variable defined
by the number of points in a sphere of radius r centered on the i-th point of the given
set, i.e. $V = V(r; \vec{r}_i)$. The PDF $P(N(r)) = P(N; r)$ of the variable $N_i(r)$ (at fixed r)
contains, in principle, information about moments of any order [43]. The first moment
is the average conditional density and the second moment is the conditional variance.
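
A minimal sketch of how the conditional counts and their first two moments could be computed for a point set in a cubic sample is given below. Only points whose sphere of radius r lies entirely inside the sample are used as centers, so that no assumption is made about the distribution outside the boundaries; this choice, the brute-force neighbour search, the normalization of the variance by the squared mean and all names are assumptions made for illustration.

```python
import math
import random

def conditional_counts(points, r, box=1.0):
    """N_i(r): number of neighbours within r of each point whose sphere fits in the cube."""
    r2 = r * r
    counts = []
    for i, c in enumerate(points):
        if not all(r <= x <= box - r for x in c):
            continue  # the sphere would cross the sample boundary
        n = 0
        for j, p in enumerate(points):
            if j != i and ((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                           + (p[2] - c[2]) ** 2) <= r2:
                n += 1
        counts.append(n)
    return counts

def conditional_moments(counts, r):
    """Average conditional density and normalized conditional variance of N_i(r)."""
    mean = sum(counts) / len(counts)
    var = sum((n - mean) ** 2 for n in counts) / len(counts)
    volume = 4.0 / 3.0 * math.pi * r ** 3
    return mean / volume, var / mean ** 2

# Toy data: 1500 Poisson points in the unit cube (illustrative choice).
rng = random.Random(3)
pts = [(rng.random(), rng.random(), rng.random()) for _ in range(1500)]
counts = conditional_counts(pts, 0.15)
density, rel_var = conditional_moments(counts, 0.15)
print(len(counts), "centers; conditional density:", density, "normalized variance:", rel_var)
```

For this uncorrelated test sample the average conditional density is close to the mean density of the sample; for a clustered sample it would instead depend on r.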



However, before considering the moments of the PDF we should study whether they
represent statistically meaningful estimates. Indeed, in the determination of statistical
properties through volume averages, one implicitly assumes that statistical quantities
measured in different regions of the sample are stable, i.e., that fluctuations in different
sub-regions are actually described by the same PDF. Instead, it may occur that
measurements in different sub-regions show systematic (i.e., not statistical) differences,
which depend, for instance, on the spatial position of the specific sub-regions. In such
a case the considered statistic is not stationary in space and its whole-sample average
value (i.e., any finite-sample estimation of the PDF moments) is not a meaningful
descriptor. It is in this sense that it does not provide a useful estimate of
the ensemble average quantity.

* When we take a spherical shell instead of a sphere, then we define a differential quantity instead of 
an integral one. 




4.1. Self-averaging

A simple test to determine whether there are systematic finite size effects affecting the
statistical analysis in a given sample of linear size L consists in studying the PDF of
$N_i(r)$ in sub-samples of linear size $\ell < L$ placed in different spatial regions of the sample
identified by their center-points $\{S_1, ..., S_N\}$. When, at a given scale $r < \ell$, $P(N(r); \ell; S_i)$
is the same, modulo statistical fluctuations, in the different sub-samples, i.e.,

$$ P(N(r); \ell; S_i) \approx P(N(r); \ell; S_j) \quad \forall i \neq j , \tag{12} $$

it is possible to consider whole sample average quantities. When determinations of
$P(N(r); \ell; S_i)$ in different regions $S_i$ show systematic differences, then whole sample
average quantities are ill defined. In general, this situation may occur because of (i)
the lack of the property of translational invariance or (ii) the breaking of the self-averaging
property due to finite-size effects induced by large-scale structures/voids (i.e., long-range
correlated fluctuations).
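
The self-averaging test described above can be sketched in a few lines. The example below splits a cubic sample into two sub-samples (the two halves along x), measures the counts in spheres centered on the points of each half, and compares the two empirical distributions through the largest gap between their cumulative distributions. The choice of only two sub-samples, the Kolmogorov-Smirnov-like distance, the toy distributions and all names are assumptions made for this illustration; no significance threshold is implied.

```python
import random

def counts_in_region(points, r, x_lo, x_hi):
    """N_i(r) for centers with x in [x_lo, x_hi) whose sphere fits inside the unit cube."""
    r2 = r * r
    out = []
    for i, c in enumerate(points):
        if not (x_lo <= c[0] < x_hi and all(r <= x <= 1.0 - r for x in c)):
            continue
        out.append(sum(1 for j, p in enumerate(points)
                       if j != i and ((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                                      + (p[2] - c[2]) ** 2) <= r2))
    return out

def ks_distance(a, b):
    """Largest gap between the empirical cumulative distributions of a and b."""
    values = sorted(set(a) | set(b))
    return max(abs(sum(1 for x in a if x <= v) / len(a)
                   - sum(1 for x in b if x <= v) / len(b)) for v in values)

def self_averaging_gap(points, r):
    """Distance between the PDFs of N_i(r) measured in the two halves of the sample."""
    left = counts_in_region(points, r, 0.0, 0.5)
    right = counts_in_region(points, r, 0.5, 1.0)
    return ks_distance(left, right)

rng = random.Random(7)
# Statistically homogeneous toy sample (Poisson).
uniform = [(rng.random(), rng.random(), rng.random()) for _ in range(1500)]
# Toy sample whose density grows linearly with x: translational invariance is broken.
gradient = [(rng.random() ** 0.5, rng.random(), rng.random()) for _ in range(1500)]
gap_uniform = self_averaging_gap(uniform, 0.15)
gap_gradient = self_averaging_gap(gradient, 0.15)
print("uniform:", gap_uniform, "gradient:", gap_gradient)
```

In the homogeneous case the two sub-sample distributions differ only by statistical fluctuations, whereas in the second case the difference is systematic and whole-sample averages would not be meaningful.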

While the breaking of translational invariance implies the lack of the self-averaging
property, the reverse is not true. For instance, suppose that the distribution is spherically
symmetric, with origin at $r_*$ and characterized by a smooth density profile, a function of
the distance from $r_*$ [15]. The average density in a certain volume V depends on its
distance from $r_*$: there is thus a systematic effect and Eq. 12 is not satisfied. On
the other hand, when a finite sample distribution is dominated by a single or by a few
structures then, even though it is translationally invariant in the infinite volume limit, a
statistical quantity characterizing its properties in a finite sample can be substantially
affected by finite size fluctuations. For instance, a systematic effect is present when
the average (conditional) density differs largely when it is measured in two disjoint
volumes placed at different distances from the relevant structures (i.e., fluctuations) in
the sample. In a finite sample, if structures are large enough, the measurements may
differ by much more than a statistical scatter♯. That systematic effect is sometimes
referred to as cosmic variance [22], but it is more appropriately defined as a breaking
of the self-averaging property [11], as the concept of variance (which already involves
the computation of an average quantity) may be without statistical meaning in the
circumstances described above [11]. In general, in the range of scales in which statistical
quantities give sample-dependent results, they do not represent fair estimates of the
asymptotic properties of the given distribution [11].

4.2. Spatial homogeneity

The self-averaging test (Eq. 12) is the first one to perform in order to understand whether a
distribution is spatially homogeneous or not inside a given sample. As long as the PDF
$P(N, r)$ does not satisfy Eq. 12, the distribution is spatially inhomogeneous and the moments
of the PDF are not useful estimators of the underlying statistical properties. Suppose that
Eq. 12 is found to be satisfied up to a given scale $r < L$. Now we can ask the question:
does the distribution become spatially homogeneous for $r < L$?

♯ The determination of statistical errors in a finite volume is also biased by finite-size effects [33, 16].

As mentioned above, to answer this question it is necessary to employ
statistical quantities that do not require the assumption of spatial homogeneity, such as
conditional ones p21 [TT]. In particular, the first moment of $P(N, r)$ provides an estimate
of the average conditional density defined in Eq. 3, which can be simply written as

$$\overline{n(r)}_p = \frac{1}{M(r)} \sum_{i=1}^{M(r)} \frac{N_i(r)}{V(r)} = \frac{1}{M(r)} \sum_{i=1}^{M(r)} n_i(r) . \qquad (13)$$

We recall that $N_i(r)$ gives the number of points in a sphere of radius $r$ centered on the
$i$-th point, and the sum is extended to all the $M(r)$ points contained in the sample for
which the sphere of radius $r$ is fully enclosed in the sample volume (this quantity is
$r$-dependent because of geometrical constraints; see, e.g., [H]). Analogously to Eq. 13, the
estimator of the conditional variance can be written as

$$\overline{\sigma_p^2(r)} = \frac{1}{M(r)} \sum_{i=1}^{M(r)} n_i^2(r) - \overline{n(r)}_p^{\,2} . \qquad (14)$$

In the range of scales where self-averaging properties are satisfied, one may study the
scaling properties of $\overline{n(r)}_p$ and of $\overline{\sigma_p^2(r)}$. As long as
$\overline{n(r)}_p$ presents a scaling behavior as a function of spatial separation $r$, as in
Eq. 9 with $D < 3$, the distribution is spatially inhomogeneous. When
$\overline{n(r)}_p \approx \mathrm{const}$, this constant provides an estimate of the
ensemble average density, and the scale $\lambda_0$, where the transition to a constant behavior
occurs, marks the homogeneity scale. Only in this latter situation is it possible to study
the correlation properties of weak-amplitude fluctuations. This can be achieved by
considering the two-point correlation function $\xi(r)$ defined above.
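
The estimators of the conditional density and of its variance described in this subsection can be sketched as follows. This is a minimal illustration, assuming a cubic sample volume (real survey geometries are cones with angular masks) and a brute-force neighbour search; the function name `conditional_density` and the argument `box` are hypothetical.

```python
import math

def conditional_density(points, r, box):
    """Return (mean, variance, M) of n_i(r) = N_i(r) / V(r) over valid centers.

    A center is valid only if the sphere of radius r around it lies fully
    inside the cubic box [0, box]^3 (an assumed, simplified sample geometry).
    """
    volume = 4.0 / 3.0 * math.pi * r ** 3
    densities = []
    for c in points:
        if not all(r <= x <= box - r for x in c):
            continue  # the sphere would cross the sample boundary
        n = sum(1 for p in points if p is not c and math.dist(p, c) <= r)
        densities.append(n / volume)
    m = len(densities)  # M(r): number of usable centers at this radius
    if m == 0:
        raise ValueError("no center has a fully enclosed sphere at this radius")
    mean = sum(densities) / m
    variance = sum(d * d for d in densities) / m - mean * mean
    return mean, variance, m
```

Calling the function for increasing values of `r` gives the scaling of the conditional density with radius. The number of valid centers `m` decreases as `r` grows, which is the geometrical dependence of $M(r)$ mentioned in the text.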

4.3. The two-point correlation function

Before proceeding, let us clarify some general properties of a generic statistical estimator
which are particularly relevant for the two-point correlation function $\xi(r)$. As mentioned
above, in a finite sample of volume $V$ we are only able to compute a statistical estimator
$\overline{X_V}$ of an ensemble average quantity $\langle X \rangle$. The estimator is valid if

$$\lim_{V \to \infty} \overline{X_V} = \langle X \rangle . \qquad (15)$$

If the ensemble average of the finite-volume estimator satisfies

$$\langle \overline{X_V} \rangle = \langle X \rangle \qquad (16)$$

the estimator is unbiased. When Eq. 16 is not satisfied, there is a systematic offset
which has to be carefully considered. Note that the violation of Eq. 12 implies that
Eq. 16 is not valid either. Finally, the variance of an estimator is
$\sigma_V^2 = \langle \overline{X_V}^2 \rangle - \langle \overline{X_V} \rangle^2$.
The results given by an estimator must be discussed carefully, considering its bias
and its variance in any finite sample. A strategy to understand the effect
of these features consists in changing the sample volume $V$ and studying finite-size effects
[17, 33, 11]. This is crucially important for the two-point correlation function $\xi(r)$, as
any estimator $\overline{\xi(r)}$ is generally biased, i.e., it does not satisfy Eq. 16 [321 EH]. This
occurs because the estimation of the sample density is biased when correlations extend
over the whole sample size, or beyond it. Indeed, the most common estimator of the
average density is

$$\overline{n} = \frac{N}{V} , \qquad (17)$$

where N is the number of points in a sample of volume V. It is simple to show that its 
ensemble average value can be written as 



$$\langle \overline{n} \rangle = \langle n \rangle \left( 1 + \frac{1}{V} \int_V d^3r \, \xi(r) \right) . \qquad (18)$$



Therefore only when $\xi(r) = 0$ (i.e., for a Poisson distribution) is Eq. 17 an unbiased
estimator of the ensemble average density; otherwise the bias is determined by the
integral of the ensemble average correlation function over the volume $V$.

The simplest estimator of $\xi(r)$ is the Full-Shell (FS) estimator [33], which can be
simply written, following the definition of $\xi(r)$, as

$$\overline{\xi(r)} = \frac{\overline{\langle n(r) \rangle_p}}{\overline{n}} - 1 , \qquad (19)$$



where $\overline{\langle n(r) \rangle_p}$ is the estimator of the conditional density in spherical
shells rather than in spheres as in the case of Eq. 13. Suppose that in a spherical sample of
radius $R_s$, to estimate the sample density, instead of Eq. 17 we use the estimator

$$\overline{n} = \frac{3}{4\pi R_s^3} \int_0^{R_s} \overline{\langle n(r) \rangle_p} \, 4\pi r^2 \, dr . \qquad (20)$$

Then, by construction, the estimator defined in Eq. 19 must satisfy the following integral
constraint

$$\int_0^{R_s} \overline{\xi(r)} \, r^2 \, dr = 0 . \qquad (21)$$

This condition is satisfied independently of the functional shape of the underlying
correlation function $\xi(r)$. Thus the integral constraint for the FS estimator does not
simply introduce an offset, but it causes a change in the shape of $\overline{\xi(r)}$ for
$r \to R_s$. Other choices of the sample density estimator [33, 11] and/or of the correlation
function estimator introduce distortions similar to that in Eq. 21.

In order to show the effect of the integral constraint for the FS estimator, let us
rewrite the ensemble average value of the FS estimator (i.e., Eq. 19) in terms of the
ensemble average two-point correlation function:

$$\langle \overline{\xi(r)} \rangle = \frac{1 + \xi(r)}{\frac{3}{R_s^3} \int_0^{R_s} \left( 1 + \xi(r') \right) r'^2 \, dr'} - 1 . \qquad (22)$$

By writing Eq. 22 we assume that the stochastic noise is negligible, which, of course, is
not a good approximation at any scale. However, in this way we are able to single
out the effect of the integral constraint for the FS estimator. From Eq. 22 it is clear that
this estimator is biased, as it does not satisfy Eq. 16 but only Eq. 15.
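
The effect of the integral constraint on the FS estimator can be checked numerically with a short sketch. The correlation function `toy_xi` below is an arbitrary positive function chosen for illustration (it is not the LCDM model), and the sample density is taken as the volume-weighted average of $1 + \xi(r)$ inside the sample, evaluated with the same midpoint quadrature that is used to check the constraint.

```python
def fs_estimator_mean(xi, r_s, n_steps=2000):
    """Noise-free FS estimator in a sphere of radius r_s, on a midpoint grid."""
    dr = r_s / n_steps
    radii = [(k + 0.5) * dr for k in range(n_steps)]
    weights = [r * r * dr for r in radii]  # r^2 dr volume weights
    # Sample density in units of <n>: volume-weighted average of 1 + xi(r).
    norm = sum(w * (1.0 + xi(r)) for w, r in zip(weights, radii)) / sum(weights)
    estimate = [(1.0 + xi(r)) / norm - 1.0 for r in radii]
    return radii, weights, estimate

def toy_xi(r, r0=5.0):
    """Toy correlation function, positive at all scales (an assumption)."""
    return (r0 / (r + 1.0)) ** 1.8

radii, weights, estimate = fs_estimator_mean(toy_xi, r_s=100.0)
# Integral of the estimator times r^2 over the sample: vanishes by construction.
constraint = sum(e * w for e, w in zip(estimate, weights))
print(f"integral constraint: {constraint:.2e}")
print(f"estimate at the largest radius: {estimate[-1]:.4f}")
```

Although `toy_xi` is positive everywhere, the estimate becomes negative close to the sample boundary: the constraint forces a zero crossing whose position depends on `r_s` rather than on the underlying correlation function.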




Figure 2. Absolute value of the estimation of the correlation function of the LCDM
model with the integral constraint described by Eq. 22. The thick solid line represents
the theoretical model. The zero-crossing scale corresponds to the cusp (adapted from
[33]).



As an illustrative example, let us now consider the case in which the theoretical
$\xi(r)$ is given by the LCDM model. The (ensemble average) estimator given by Eq. 22
in spherical samples of different radius $R_s$ is shown in Fig. 2. One may notice that for
$R_s > r_c$ the zero point of $\overline{\xi(r)}$ remains stable, while for $R_s < r_c$ it grows
linearly with $R_s$. The negative tail continues to be non-linearly distorted even when
$R_s > r_c$. For instance, when $R_s \approx 600$ Mpc/h we are not able to detect the
$\xi(r) \sim -r^{-4}$ tail, which becomes marginally visible only when $R_s > 1000$ Mpc/h. Thus
the stability of the zero-crossing scale should be the first problem to be considered in the
analysis of $\xi(r)$, clearly once spatial homogeneity has already been proved.

5. Results in the data 

We briefly review the main results obtained by analyzing several samples of the Sloan 
Digital Sky Survey (SDSS) pH QH QU ESJ [15] and of the Two degree Field Galaxy 
Redshift Survey (2dFGRS) (13, [13J, E] . In both catalogues we selected, in the angular 
coordinates, a sky region such that (i) it does not overlap with the irregular edges of the 
survey mask and (ii) it covers a contiguous sky area. We computed the metric distance 
$R(z; \Omega_m, \Omega_\Lambda)$ from the redshift $z$ by using the cosmological parameters
$\Omega_m = 0.25$ and $\Omega_\Lambda = 0.75$.

The SDSS catalogue includes two different galaxy samples constructed by using
different selection criteria: the main-galaxy (MG) sample and the Luminous Red Galaxy
(LRG) sample. In particular, the MG sample is a flux-limited catalogue with apparent
magnitude $m_r < 17.77$ [16], while the LRG sample was constructed to be volume-limited
(VL) [IT]. A sample is flux limited when it contains all galaxies brighter than a certain
apparent flux $f_{min}$. There is an obvious selection effect in that it contains intrinsically
faint objects only when these are located relatively close to the observer, while it contains
intrinsically bright galaxies located in a wide range of distances j6]. For this reason one
constructs a volume-limited (VL) sample by imposing a cut in absolute luminosity $L_{min}$
and by computing the corresponding cut in distance
$r_{max} \approx \sqrt{L_{min} / (4\pi f_{min})}$, so that all galaxies with $L > L_{min}$,
located at distances $r < r_{max}$, have flux $f > f_{min}$ and are thus
included in the sample. By choosing different cuts in absolute luminosity one obtains
several VL samples (with different $L_{min}$, $r_{max}$). Note that we use magnitudes instead
of luminosities and that the absolute magnitude must be computed from the redshift
by taking into account both the assumptions on the cosmology (i.e., the cosmological
parameters, which very weakly perturb the final results given the low redshifts involved,
i.e., z < 0.2) and the K-corrections (which are measured in the SDSS case).
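
The construction of a VL sample can be sketched in a few lines. The example below is a simplified illustration that works with luminosities and the Euclidean relation $f = L / (4\pi r^2)$, ignoring the cosmological and K-corrections mentioned in the text; the function names and the toy catalogue are hypothetical.

```python
import math

def r_max_for_cut(l_min, f_min):
    """Largest distance at which a source of luminosity l_min has flux f_min."""
    return math.sqrt(l_min / (4.0 * math.pi * f_min))

def volume_limited(galaxies, l_min, f_min):
    """Keep galaxies with L > L_min and r < r_max; each galaxy is (r, L)."""
    r_max = r_max_for_cut(l_min, f_min)
    return [(r, lum) for r, lum in galaxies if lum > l_min and r < r_max]

# Toy catalogue of (distance, luminosity) pairs in arbitrary units.
catalogue = [(0.5, 20.0), (0.5, 5.0), (2.0, 20.0), (0.9, 13.0)]
sample = volume_limited(catalogue, l_min=4.0 * math.pi, f_min=1.0)
# Every selected galaxy is brighter than the flux limit, at any distance
# inside r_max, so the selection no longer depends on distance.
fluxes = [lum / (4.0 * math.pi * r * r) for r, lum in sample]
```

Repeating the selection with different values of `l_min` gives the set of VL samples with different depths described above.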

For the MG sample we used standard K-corrections from the VAGC data [IS]: we
have tested that our main results do not depend significantly on K-corrections and/or
evolutionary corrections [UJ. The MG sample angular region we consider is limited, in
the SDSS internal angular coordinates, by $-33.5^\circ < \eta < 36.0^\circ$ and
$-48.0^\circ < \lambda < 51.5^\circ$: the resulting solid angle is $\Omega = 1.85$ sr. For the
LRG sample, we exclude redshifts z > 0.36 and z < 0.16 (where the catalogue is known to be
incomplete [lElE]), so that the distance limits are $R_{min} = 465$ Mpc/h and
$R_{max} = 1002$ Mpc/h. The limits in R.A. $\alpha$ and Dec. $\delta$ considered are
$\alpha \in [130^\circ, 240^\circ]$ and $\delta \in [0^\circ, 50^\circ]$. The absolute magnitude
is constrained in the range $M \in [-23.2, -21.2]$. With these limits we find N = 41833
galaxies covering a solid angle $\Omega = 1.471$ sr [19]. Finally, for 2dFGRS, to avoid the
effect of the irregular edges of the survey we selected two rectangular regions whose limits
are [14]: in the southern galactic cap (SGC), $-33^\circ < \delta < -24^\circ$ and
$-32^\circ < \alpha < 52^\circ$, and in the northern galactic cap (NGC),
$-4^\circ < \delta < 2^\circ$ and $150^\circ < \alpha < 210^\circ$; we determined absolute
magnitudes $M$ using K-corrections from [50, 14].

5.1. Redshift selection function 

In order to have a simple picture of the redshift distribution in a magnitude-limited
sample, we report in Fig. 3 galaxy counts as a function of the radial distance, in bins of
thickness 10 Mpc/h, in the northern and southern parts of the 2dFGRS [14, 13]. One
may notice that a sequence of structures and voids is clearly visible, but there is an
overall trend (a peak and then a decrease of the density) which is determined
by a luminosity selection effect. Indeed, n(R) in a flux-limited sample is usually called the
redshift selection function, as it is determined by both the redshift distribution and
the luminosity selection criteria of the survey. It is thus not easy, by this kind of
analysis, to determine, even at a first approximation, the main properties of the galaxy
distribution in the samples. Nevertheless, one may readily compute that there is a
~ 30% difference in the sample density between the northern and the southern parts
of the catalogue: one needs to refine the analysis to clarify its significance. Note that
large-scale ~ 30% fluctuations are not uncommon. For instance, fluctuations close to 50%,
occurring on ~ 100 Mpc/h scales, have been found in galaxy redshift and magnitude counts
[HI [191 1201 [21].

Figure 3. Radial density per unit solid angle in bins of thickness 10 Mpc/h in the
northern (NGC) and southern (SGC) parts of the 2dFGRS magnitude-limited sample.
There is a large structure at ~ 240 Mpc/h. The inset panel shows the
distribution of $N_i(r; R)$ for r = 10 Mpc/h in a VL sample in the NGC (adapted
from [14]).

5.2. Radial counts 

More direct information about the value of the density is provided
by the number counts of galaxies as a function of radial distance, n(R), in a VL sample.
For a spatially homogeneous distribution n(R) should be constant, while for a fractal
distribution it should exhibit a power-law decay, even though large fluctuations are
expected to occur given that this is not an average quantity [51].
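
A minimal sketch of how the radial density n(R) can be computed from a list of radial distances is given below. It assumes that the sample covers a fixed solid angle with no angular mask, so that the volume of each radial bin is $\Omega (R_{hi}^3 - R_{lo}^3) / 3$; the function name and arguments are hypothetical.

```python
def radial_density(distances, solid_angle, r_min, r_max, bin_width):
    """Number density in radial shells of a cone of given solid angle (in sr).

    Returns a list of (bin center, density) pairs.
    """
    n_bins = int(round((r_max - r_min) / bin_width))
    counts = [0] * n_bins
    for d in distances:
        k = int((d - r_min) // bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    result = []
    for k, c in enumerate(counts):
        lo = r_min + k * bin_width
        hi = lo + bin_width
        shell_volume = solid_angle / 3.0 * (hi ** 3 - lo ** 3)
        result.append((0.5 * (lo + hi), c / shell_volume))
    return result
```

Because the shell volume grows with distance, nearby bins contain few objects and are noisier than distant ones, which is one reason why n(R) gives only a rough picture of the fluctuations.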

In the SDSS MG VL samples, at small enough scales, n(R) (see the left panel of
Fig. 5) shows a fluctuating behavior with peaks corresponding to the main structures in
the galaxy distribution [11]. At larger scales n(R) increases by a factor 3 from
$R \approx 300$ Mpc/h to $R \approx 600$ Mpc/h. Thus there is no range of scales where one may
approximate n(R) with a constant behavior. The open question is whether the growth of n(R) for
R > 300 Mpc/h is induced by structures and/or by observational selection effects in the
data: in principle, both are possible. For instance, in [27] it is argued that a substantial
galaxy evolution causes that growth, while in [11] it is argued that structures certainly
contribute to the observed behavior. (Note that in mock catalogues drawn from
cosmological N-body simulations one measures an almost constant density [11, 14].)

Figure 4. Two different slices of the same SDSS MG VL sample. Both slices cover
an angular region of 100° x 10°, but in different directions.

Given that, by construction, also the LRG sample should be VL [53j [7J H], the
behavior of n(R) is expected to be constant if the galaxy distribution is close to uniform
(up to Poisson noise and radial clustering). It is instead observed that the LRG sample
n(R) shows an irregular and non-constant behavior (see the right panel of Fig. 5), rather
different from that found in the MG sample. Indeed, there are two main features: (i)
a negative slope for 400 Mpc/h < r < 800 Mpc/h (i.e., 0.16 < z < 0.28) and (ii)
a positive slope up to a local peak at r ~ 950 Mpc/h (i.e., z ~ 0.34). Note that if n(R)
were constant we would expect a behavior similar to the one shown by the mock sample
extracted from the Horizon simulation [52] (see Fig. 5) [49].

An explanation that is usually given to interpret the behavior of n(R) [7J H]
is that the LRG sample is "quasi" VL, precisely because it does not show a constant
n(R). Thus, the unexpected trends and features of n(R) are absorbed in the properties
of the so-called "survey selection function", which is unknown a priori, but is
defined a posteriori as the difference between an almost constant n(R) and the
behavior observed. This explanation is unsatisfactory as it is given a posteriori and no
independent tests have been provided to corroborate the hypothesis that an important
observational selection effect occurs in the data, other than the behavior of n(R) itself.
A different possibility is that the behavior of n(R) is determined, at least partially, by
intrinsic fluctuations in the distribution of galaxies and not by selection effects.

Figure 5. Left panel: Radial density in the volume-limited samples of the MG
catalogue. Note that the amplitude of n(R) for the MG VL samples has been normalized
by taking into account the different luminosity selection in the different samples
(adapted from [H]). Right panel: The same for the LRG sample and for a mock
sample extracted from the Horizon simulation [52] (units are (Mpc/h)^-3). The
blue dashed line decays as r^-1 and is plotted as a reference (adapted from [49]).

Note that, by attributing the behavior of n(R) to unknown selection effects, it is
implicitly assumed that more than 20% of the total galaxies have not been measured
because of observational problems [10]. This looks improbable [53], although a more careful
investigation of the problem is needed. Note also that the deficit of galaxies
would not be explained by a smooth redshift-dependent effect; rather, the selection must
be strongly redshift dependent, as the behavior of n(R) is not monotonic. These facts
point toward, but do not prove, an origin of the n(R) behavior in the intrinsic
fluctuations of the galaxy distribution.

5.3. Test on self-averaging properties

Galaxy counts provide only a rough analysis of fluctuations as one is unable to compute 
a truly volume average quantity. In addition galaxy counts sample different scales 
differently as the volume in the different redshift bins is not the same. The analysis of 
the stochastic variable represented by the number of points in spheres $N_i(r)$ can help to
overcome these problems, as it is possible to construct volume averages and because it
is computed in a simple real-space sphere. (See an example in the inset panel of Fig. 3.)

Let us thus pass to the self-averaging test described in Sect. 4.1. To this aim we
divide the sample into two non-overlapping regions of equal volume, one at low (L) and
the other at high (H) redshifts. We then measure the PDFs $P_L(N; r)$ and $P_H(N; r)$ in
the two volumes (see [15] for more details). Given that the number of independent
points is not very large at large scales (i.e., $M(r)$ in Eq. 13 is not much larger than ~ 10^4),
in order to improve the statistics, especially for large sphere radii, we allow a partial
overlap between the two sub-samples, so that galaxies in the L (H) sub-sample
also count galaxies in the H (L) sub-sample. This overlap can clearly only smooth
out differences between $P_L(N; r)$ and $P_H(N; r)$.

Figure 6. Upper panels: PDF of the counts in spheres in the sample defined by
$R \in [125, 400]$ Mpc/h and $M \in [-20.5, -22.2]$ in the DR6 and DR7 data, for two
different values of the sphere radius, r = 10 Mpc/h and r = 80 Mpc/h. Lower panels:
The same but for the sample defined by $R \in [200, 600]$ Mpc/h and $M \in [-21.6, -22.8]$
and for r = 20, 120 Mpc/h (adapted from [15]).
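
For a sample covering a fixed solid angle between two radial distances, the boundary that separates two regions of equal volume follows from equating the two shell volumes. The sketch below assumes that the split is made along the radial direction only, which the text does not state explicitly; the function name is hypothetical, and the distance limits are those quoted for the first sample of Fig. 6.

```python
def equal_volume_split(r_min, r_max):
    """Radius dividing the shell [r_min, r_max] of a cone into two equal volumes."""
    return ((r_min ** 3 + r_max ** 3) / 2.0) ** (1.0 / 3.0)

r_split = equal_volume_split(125.0, 400.0)
low_volume = r_split ** 3 - 125.0 ** 3    # proportional to the volume of region L
high_volume = 400.0 ** 3 - r_split ** 3   # proportional to the volume of region H
print(f"L: [125, {r_split:.1f}] Mpc/h, H: [{r_split:.1f}, 400] Mpc/h")
```

The dividing radius lies well beyond the midpoint of the distance range, because most of the volume of a cone is at large distances.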

We first consider two SDSS MG VL samples from data release 6 (DR6) [11]
and then from DR7 [15]. In the first case (upper-left panels of Fig. 6), at small
scales (r = 10 Mpc/h), the distribution is self-averaging (i.e., the PDF is statistically
the same) both in the DR6 sample (which covers a solid angle $\Omega_{DR6} = 0.94$ sr) and
in the DR7 sample ($\Omega_{DR7} = 1.85$ sr $\approx 2 \times \Omega_{DR6}$). Instead, for larger
sphere radii, i.e., r = 80 Mpc/h (bottom-right panels of Fig. 6), in the DR6 sample the two
PDFs clearly show a systematic difference. Not only do the peaks not coincide, but the overall
shape of the PDF is not smooth and differs between the two determinations. Instead, for the
sample extracted from DR7, the two determinations of the PDF are in good agreement (within
statistical fluctuations). We conclude that in DR6 for r = 80 Mpc/h there are large
density fluctuations which are not self-averaging because of the limited sample volume
[11, 15]. They are instead self-averaging in DR7 because the volume is increased by a
factor two.

For the other sample we consider, which includes mainly bright galaxies, the breaking
of self-averaging properties occurs only for large r, both in the DR6 and in the
DR7 samples. As mentioned above, radial-distance-dependent selections, like galaxy
evolution [27], could in principle give an effect in the same direction if they tend to
increase the number density with redshift. However, this would not change the main
conclusion that, on large enough scales, self-averaging is broken. Note that in the SDSS
samples, for small values of r, the PDF is found to be statistically stable in different
sub-regions of a given sample. For this reason we do not interpret the lack of
self-averaging properties as due to a "local hole" around us: this would affect all samples
and all scales, which is indeed not the case [15]. Because of these large fluctuations
in the galaxy density field, self-averaging properties are well defined only in a limited
range of scales, where it is then statistically meaningful to measure whole-sample average
quantities [PT | [36 ] H5].

For the LRG sample (see Fig. 7) one may note that for r = 50 Mpc/h the
determinations in the two separate parts of the sample are much closer than for larger
sphere radii. Indeed, for r > 100 Mpc/h there is actually a noticeable difference in the
whole shape of the PDF. The fact that $P_H(N; r)$ is shifted toward smaller values than
$P_L(N; r)$ is related to the decaying behavior of the redshift counts (see Fig. 5): most
of the galaxies at low redshift see a relatively larger local density than the galaxies at
higher redshift.

In summary, due to the breaking of self-averaging properties in the different samples
for r < 150 Mpc/h, we conclude that there is no evidence for a crossover to spatial
uniformity.
Figure 7. Upper Left Panel: PDF for r = 50 Mpc/h in the LRG sample (adapted
from [49]).

5.4. Probability density function and its moments

In the range of scales where self-averaging properties are found to be satisfied, we can
refine the analysis by characterizing the shape of the PDF and the scaling of its
moments, particularly the first moment, i.e., the average conditional density
(Eq. 13), whose behavior is presented in Fig. 8. In brief, it decays approximately as $r^{-1}$
up to $\approx 20$ Mpc/h, where the decay changes to $n(r) \sim 0.011 \times r^{-0.29}$ ††. Moreover,
the density n(r) does not saturate to a constant up to ~ 100 Mpc/h, i.e., up to the largest
scales probed in this sample where self-averaging properties have been tested to hold. Fig. 8
also shows the behavior of n(r) in two non-overlapping regions of equal volume:
these show the typical fluctuations affecting the estimation of this quantity.

The scaling behavior of the conditional density implies that galaxy structures are
characterized by non-trivial correlations for scales up to $r \approx 100$ Mpc/h, without a
crossover towards spatial homogeneity.
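
The exponent of a power-law decay of the conditional density can be estimated with a least-squares fit in log-log space. The sketch below is only an illustration of this standard technique, not necessarily the fitting procedure used for the values quoted above; the input data are synthetic, generated from an exact power law, and are not measurements.

```python
import math

def fit_power_law(radii, densities):
    """Least-squares fit of log n = log A - gamma * log r; returns (A, gamma)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(n) for n in densities]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    amplitude = math.exp(y_mean - slope * x_mean)
    return amplitude, -slope

radii = [20.0, 30.0, 45.0, 70.0, 100.0]
densities = [0.011 * r ** -0.29 for r in radii]  # synthetic data, not measurements
amplitude, gamma = fit_power_law(radii, densities)
print(f"A = {amplitude:.4f}, gamma = {gamma:.3f}")
```

With real data the points at different radii are strongly correlated, since the same galaxies contribute to all scales, so the formal uncertainty of such a fit underestimates the true one.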

To probe the whole distribution of the conditional density $n_i(r)$, we fitted the
measured PDF with a Gumbel distribution via its two parameters $\alpha$ and $\beta$ [36]. The
Gumbel distribution is one of the three extreme value distributions [54, 55]. It describes
the distribution of the largest values of a random variable from a density function with
faster than algebraic (say exponential) decay. The Gumbel distribution's PDF is given by

$$P(y) = \frac{1}{\beta} \exp\left[ -\frac{y - \alpha}{\beta} - \exp\left( -\frac{y - \alpha}{\beta} \right) \right] . \qquad (23)$$
†† Alternatively, an almost indistinguishable fit is provided by a slow logarithmic decay of n(r).
Figure 8. Conditional average density n(r) of galaxies as a function of radius.
Note the change of slope at $\approx 20$ Mpc/h and note also that there is no flattening up
to $\approx 100$ Mpc/h (the inset panel shows a zoom at large scales). The statistical
significance of the last few points at the largest scales is weaker (see text). The behavior
of n(r) in two non-overlapping and equal-volume regions, named R2 and R3, is also
plotted.
The mean and the variance of the Gumbel distribution (Eq. 23) are $\mu = \alpha + \gamma\beta$ and
$\sigma^2 = (\beta\pi)^2 / 6$, where $\gamma = 0.5772\ldots$ is the Euler constant.
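
The relations between the moments and the parameters of the Gumbel distribution can be inverted to obtain a quick estimate of $\alpha$ and $\beta$ from a measured mean and variance. The sketch below uses this method of moments as a simple stand-in; it is an assumption made for illustration and not necessarily the fitting procedure used to obtain the results discussed here.

```python
import math

EULER_GAMMA = 0.5772156649015329

def gumbel_pdf(y, alpha, beta):
    """Gumbel density with location alpha and scale beta."""
    z = (y - alpha) / beta
    return math.exp(-z - math.exp(-z)) / beta

def gumbel_from_moments(mean, variance):
    """Invert mean = alpha + gamma * beta and variance = (beta * pi)^2 / 6."""
    beta = math.sqrt(6.0 * variance) / math.pi
    alpha = mean - EULER_GAMMA * beta
    return alpha, beta

# Round trip: moments computed from known parameters give the parameters back.
alpha, beta = gumbel_from_moments(1.0 + EULER_GAMMA * 2.0, (2.0 * math.pi) ** 2 / 6.0)
# Crude check that the density is normalized (rectangle rule on a wide grid).
step = 0.01
total = sum(gumbel_pdf(-20.0 + k * step, alpha, beta) * step for k in range(8000))
```

The density is asymmetric, with a tail toward large values, which is the feature that distinguishes it from a Gaussian in the comparison with the measured PDF.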

One of our best fits for the PDF is obtained for r = 20 Mpc/h (see Fig. 9). At
larger scales the fit gets worse, but the Gumbel function remains a good fit even for
r = 110 Mpc/h. Given that the main source of uncertainty is, as discussed, finite-volume
systematic effects, it is not simple to determine the statistical significance of the
Gumbel fit, as systematic errors are larger than statistical ones.

The fact that the PDF is clearly asymmetric, and well fitted by a Gumbel function,
provides additional evidence that correlations are long-range. Indeed, due to
the Central Limit Theorem, all homogeneous point distributions with short-range
correlations lead to Gaussian fluctuations [11]. It was recently conjectured [56] that
only three types of distributions appear to describe fluctuations of global observables at
criticality. In particular, when the global observable depends weakly on the system size
(e.g., logarithmically), the corresponding distribution should be a (generalized) Gumbel
distribution.
Figure 9. PDF for r = 20, 50, 80, 100 Mpc/h. The solid line corresponds to the best
fit with a Gumbel distribution. 



5.5. Two-point correlation analysis 

When one determines the standard two-point correlation function one implicitly makes
the assumptions that, inside a given sample, the distribution is (i) self-averaging and
(ii) spatially uniform. The first assumption is used when one computes whole-sample
average quantities. The second is employed when supposing that the estimation of the
sample density gives a fairly good estimate of the ensemble average density. When
one of these assumptions, or both, is not verified, the interpretation of the results
given by the determinations of the standard two-point correlation function must be
reconsidered with great care.

To show how non-self-averaging fluctuations inside a given sample bias the $\xi(r)$
analysis, we consider the estimator

$$\overline{\xi(r)} + 1 = \overline{\xi(r; R, \Delta R)} + 1 = \frac{\overline{N(r, \Delta r)}}{V(r, \Delta r)} \cdot \frac{V(r^*)}{\overline{N(r^*; R, \Delta R)}} , \qquad (24)$$

where the second ratio on the r.h.s. is now the inverse of the density of points in spheres
of radius $r^*$ averaged over the galaxies lying in a shell of thickness $\Delta R$ around the
radial distance $R$. If the distribution is homogeneous, i.e., $r^* > \lambda_0$, and statistically
stationary, Eq. 24 should be (statistically) independent of the range of radial distances
$(R, \Delta R)$ chosen. The two-point correlation function is defined as a ratio between the
average conditional density and the sample average density: if both vary in the same way when
the radial distance is changed, then its amplitude remains nearly constant. This, however, does
not imply that the amplitude of $\xi(r)$ is meaningful, as it can happen that the density
estimated in sub-volumes of size $r^*$ shows large fluctuations, and so does the average
conditional density, with this occurring with a radial-distance dependence. The $\xi(r)$
analysis gives a meaningful estimate of the amplitude of fluctuations only if this amplitude
remains stable when changing the relative position of the sub-volumes of size $r^*$ used to
estimate the average conditional density and the sample average density. This can be tested by
using the estimator in Eq. 24, whereas standard estimators (57J EJ [33] are not able to test
for such an effect, as the main contributions to both the conditional density and the
sample average density come from the same part of the sample (typically the far-away
part, where the volume is larger). We find large variations in the amplitude of $\xi(r)$ in the
SDSS MG VL samples (see the left panel of Fig. 10). This is simply an artifact generated
by the large density fluctuations on scales of the order of the sample sizes. The result
that the estimator of $\xi(r)$ has nearly the same amplitude in different samples, e.g.,
[58| EHJ EOl EU [3 El E], despite the large fluctuations of $N_i(r; R)$, is simply explained
by the fact that $\xi(r)$ is a ratio between the average conditional density and the sample
average density: both vary in the same way when the radial distance is changed and
thus the amplitude is nearly constant.

Figure 10. Left panel: The two-point correlation function in a MG-VL sample
estimated by Eq. 24: the sample average density is computed in spheres of radius
$r^* = 60$ Mpc/h, considering all center-points lying in a bin of thickness $\Delta R = 50$
Mpc/h centered at different radial distances $R$: $R_1 = 250$ Mpc/h ($n_1$) and $R_2 = 350$
Mpc/h ($n_2$). The case in which we have used the estimate of the sample average
N/V ($n_s$) is also shown, and it agrees with the FS estimator (adapted from [H]). Right
panel: The Landy and Szalay [57] estimator of $\xi(r)$ in various MG-VL samples and
in a LRG sample of the SDSS. The most evident feature is the finite-size dependence
of both the amplitude and the zero-crossing scale (adapted from [E]). The solid line is a
LCDM model.

In the right panel of Fig. 10 the behavior of $\xi(r)$ in samples of different
size is plotted. This clearly shows that there is a finite-size dependence of both the
amplitude of the correlation function and of the zero-crossing scale. Therefore the estimator
of $\xi(r)$ is biased by volume-dependent systematic effects that make the detected correlation
amplitudes only estimates of their lower limit [16]. A similar conclusion was reached
by [02], i.e., that when corrections for possible systematics are taken into account the
correlation function may not be consistent with as high an amplitude peak as claimed by
[3]. To clarify this issue it is necessary to consider the set of tests
for statistical and spatial homogeneity discussed above.

Instead of investigating the origin of the fluctuating behavior of n(R), some authors
[I] focused their attention on the effect of the radial counts on the determination of the
two-point correlation function. In particular, they proposed mainly two different tests
to study the effect of n(R) on the determination of $\xi(r)$. The first test consists
in taking a mock LRG sample, constructed from a cosmological N-body simulation of
the LCDM model, and applying a redshift selection which randomly excludes points
in such a way that the resulting distribution has the same n(R) as the real sample. Then
one can compare $\xi(r)$ obtained in the original mock and in the redshift-sampled mock. [I]
find that there is good agreement between the two. This shows that the particular
kind of redshift-dependent random sampling considered for the given distribution does
not alter the determination of the correlation function. Alternatively, we may conclude
that, under the assumption that the observed LRG sample is a realization of a mock
LCDM simulation, n(R) does not affect the result. However, if we want to test
whether the LRG sample has the same statistical properties as the mock catalogue, we
clearly cannot prove (or disprove) this hypothesis by assuming a priori that it is true.

In other words, standard analyses ask directly the question of whether the data are 
compatible with a given model, by considering only a few statistical measurements. As 
it was shown by [5] the LRG correlation function does not pass the null hypothesis, i.e. 
it are compatible with zero signal, implying that the volume of current galaxy samples 
is not large enough to claim that the BAO scale is detected. In addition, by assuming 
that the galaxy correlations are modelled by a LCDM model, one may find that the 
data allow to constrain the position of the BAO scale. In our view this approach is too 
narrow: in evaluating whether a model is consistent with the data, one should show that 
at least the main statistical properties of the model are indeed consistent with the data. 
As discussed above, a number of different properties can be considered, which are useful 
to test the assumptions of (i) self-averaging and (ii) spatial homogeneity. When, inside
the given sample, assumption (i) and/or (ii) is violated, the compatibility
test of the data with an LCDM model is not consistent with the properties of the data
themselves.

6. Conclusion 

The statistical characterization of galaxy structures presents a number of subtle 
problems. These are associated both with the a-priori assumptions which are encoded
in the statistical methods used in the measurements of galaxy correlations and with the
a-posteriori hypotheses that are invoked to explain certain measured behaviors. The
latter include, for example, luminosity bias, galaxy evolution, observational selection
effects, etc. Therefore it is necessary to introduce direct tests to understand both 
whether the a-priori assumptions are compatible with the data and whether it is justified 
to introduce a-posteriori untested, but plausible, hypotheses to interpret the results of 
the data analysis. For instance, the analysis of the simple counts as a function of 
distance, in the SDSS samples, shows clearly that the observed behavior is incompatible
with model predictions, i.e., spatial homogeneity. As mentioned above, one may assume 
that the differences between the model and the observations are due to selection effects. 
This then clearly becomes the most important assumption in the data analysis, and it
must be stated explicitly. In addition, one must consider whether there
is an independent way to study selection effects in the data. 

On the basis of the results we have presented, which aim to test directly whether spatial
and statistical homogeneity are verified inside the available samples, we conclude that
the galaxy distribution is characterized by structures of large spatial extension. Given that
we are unable to find a crossover towards homogeneity, the amplitude of these structures
remains undetermined and their main characteristic is represented by the scaling behavior
of their relevant statistical properties. In particular, we discussed that the average 
conditional density presents a scaling behavior of the type $\sim r^{\gamma}$ with $\gamma \approx -1$ up to
~ 20 Mpc/h, followed by a $\gamma \approx -0.3$ behavior up to ~ 100 Mpc/h. Correspondingly, the
probability density function (PDF) of galaxy (conditional) counts in spheres shows a
relatively long tail: it is well fitted by the Gumbel function rather than by the Gaussian
function, which is generally expected for spatially homogeneous, short-range correlated
density fields.
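
To make the "long tail" statement concrete, the following sketch compares the upper-tail probability of a Gumbel distribution with that of a Gaussian of the same mean and variance. It assumes the standard two-parameter Gumbel form, with PDF proportional to exp(-(z + exp(-z))) and z = (x - alpha)/beta; the function names are hypothetical and no fit to the actual counts is attempted.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_tail(k_sigma):
    """P(x > mean + k_sigma * sigma) for a Gumbel distribution."""
    # In the standardized variable z = (x - alpha) / beta the mean is
    # EULER_GAMMA and the standard deviation is pi / sqrt(6).
    z = EULER_GAMMA + k_sigma * math.pi / math.sqrt(6.0)
    # Survival function 1 - exp(-exp(-z)), written with expm1 for accuracy.
    return -math.expm1(-math.exp(-z))

def gaussian_tail(k_sigma):
    """P(x > mean + k_sigma * sigma) for a Gaussian distribution."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))

for k in (2, 3, 4):
    print(k, gumbel_tail(k), gaussian_tail(k))
```

At a fixed number of standard deviations above the mean, the Gumbel tail probability is larger than the Gaussian one, and the ratio grows with the threshold. This is the sense in which large positive fluctuations in the counts are more frequent than a Gaussian would allow.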

The statistical tests introduced here can thus provide direct observational evidence,
at small scales and low redshifts (when $z \ll 1$ we can neglect the important
complications of evolving observations onto a spatial surface, for which we need a
specific cosmological model), concerning the basic assumptions used in the derivation of the FRW
models, i.e., spatial and statistical homogeneity. In this respect it is worth further
clarifying the subtle difference between these two concepts [15]. The concordance model
of the universe combines three fundamental assumptions: (i) Einstein's field equations 
to determine the dynamics of space-time, (ii) Statistical homogeneity and isotropy, i.e., 
that "the Earth is not in a central, specially favored position" [Ml 165] . This requirement 
can be thought of as the Copernican Principle, which is fundamental because
one wants to avoid any special point or direction. (iii) Spatial homogeneity: this
requirement is not as fundamental as (ii) but plays the crucial role of simplifying the
solutions of Einstein's field equations.

The Cosmological Principle is usually meant to include both the requirement of 
statistical homogeneity and isotropy and of spatial homogeneity: these assumptions 
are often simply summarized in the requirement that the universe is homogeneous and 
isotropic. However, one must bear in mind that the fact that the universe looks the same,
at least in a statistical sense, in all directions, and that all observers are alike, does not
imply spatial homogeneity of the matter distribution. It is however this latter condition
that allows us to treat, above a certain scale, the density field as a smooth function, a 
fundamental hypothesis used in the derivation of the FRW metric. 

We have shown that the galaxy distribution in different samples of the SDSS is
compatible with the assumption that it is translationally invariant, i.e., it satisfies the
requirement of the Copernican Principle that there are no special points or directions.
On the other hand, we found no clear evidence of spatial homogeneity up
to scales of the order of the sample sizes, i.e., ~ 100 Mpc/h. This implies that the galaxy
distribution is not compatible with the stronger assumption of spatial homogeneity,
encoded in the Cosmological Principle. In addition, at the largest scales probed by these
samples (i.e., $r \approx 150$ Mpc/h) we found evidence for the breaking of self-averaging
properties, i.e., that the distribution is not statistically homogeneous. Forthcoming
redshift surveys will allow us to clarify whether on such large scales the galaxy distribution
is still inhomogeneous but statistically stationary, or whether the evidence for the
breaking of spatial translational invariance found in the SDSS samples was due to
selection effects in the data. 

We note an interesting connection between spatial inhomogeneities and large 
scale flows which can be hypothesized by assuming that the gravitational fluctuations 
in the galaxy distribution reflect those in the whole matter distribution, and that 
peculiar velocities and accelerations are simply correlated. Peculiar velocities provide 
important dynamical information, as they are related to the large-scale matter
distribution. By studying their local amplitudes and directions, these velocities allow us,
in principle, to probe deeper, or hidden, parts of the Universe. The peculiar velocities are
indeed directly sensitive to the total matter content, through its gravitational effects, and
not only to the luminous matter distribution. However, their direct observation through
distance measurements remains a difficult task. Recently, a growing number of
observations of large-scale coherent galaxy motions that are at odds
with standard cosmological models have been published [681 EH EHJ [70] .

It is possible to consider the PDF of gravitational force fluctuations generated by
the source field represented by galaxies, and to test whether it converges to an asymptotic
shape within sample volumes. In several SDSS samples we find that density fluctuations
at the largest scales probed, i.e., $r \approx 100$ Mpc/h, still contribute significantly to the
amplitude of the gravitational force [66]. Under the hypotheses mentioned above we
may conclude that large-scale fluctuations in the galaxy density field can be the
source of the large-scale flows recently observed.
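
The convergence question can be illustrated with a deliberately simplified sketch: the Newtonian force at a single point, summed over all sources within a growing radius. This is not the PDF analysis of [66]. The function name `force_vs_radius`, the units (G times the point mass set to 1) and the toy Poisson point set are assumptions made only for illustration.

```python
import numpy as np

def force_vs_radius(positions, radii):
    """Force per unit mass at the origin (units with G*m = 1) due to all
    points closer than each radius in `radii` (hypothetical helper)."""
    pos = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(pos, axis=1)
    # Each source pulls the origin towards itself with strength 1 / dist**2.
    contrib = pos / dist[:, None] ** 3
    return np.array([contrib[dist < R].sum(axis=0) for R in radii])

# Toy source field (assumption): Poisson points in a cube of side 200.
rng = np.random.default_rng(0)
points = rng.uniform(-100.0, 100.0, size=(20000, 3))
radii = np.array([10.0, 25.0, 50.0, 100.0])
F = force_vs_radius(points, radii)
for R, f in zip(radii, np.linalg.norm(F, axis=1)):
    print(R, f)
```

If the force amplitude keeps changing appreciably as the radius approaches the sample size, fluctuations at the largest scales still contribute to it, which is the behavior reported above for the SDSS samples. Repeating the calculation over many centers would give the force PDF at each radius.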

From the theoretical point of view, it is then necessary to understand how to treat 
inhomogeneities in the framework of General Relativity [65] EU [72] EH EH EHl [76] [77].
To this aim one needs to carefully consider the information that can be obtained from 
the data. At the moment it is not possible to obtain statistical information at large
redshifts (z ~ 1), but the characterization of relatively small-scale properties (i.e.,
r < 200 Mpc/h) is becoming more and more accurate. According to FRW models the
linearity of the Hubble law is a consequence of the homogeneity of the matter distribution.
Modern data show a good linear Hubble law even for nearby galaxies (r < 10 Mpc/h).
This raises the question of why the Hubble law is linear at scales where the visible
matter is distributed inhomogeneously. Several solutions to this apparent paradox have
been proposed [731 EH1 ES] : this situation shows that already the small-scale properties of
the galaxy distribution have a lot to say about their theoretical interpretation.
Indeed, while observations of galaxy structures have given an impulse to the search 
for more general solutions of Einstein's equations than the Friedmann one, it is now a
fascinating question whether such a more general framework may provide a different 
explanation to the various effects that, within the standard FRW model, have been 
interpreted as Dark Energy and Dark Matter. 